-
Notifications
You must be signed in to change notification settings - Fork 23
implement dpnp.piecewise
#2550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
implement dpnp.piecewise
#2550
Conversation
View rendered docs @ https://intelpython.github.io/dpnp/pull/2550/index.html |
Array API standard conformance tests for dpnp=0.19.0dev3=py313h509198e_10 ran successfully. |
if isinstance(func, dpnp.ndarray): | ||
func = func.astype(x.dtype) | ||
|
||
# TODO: possibly can use func.item() to make sure that func is always |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to use dpnp.where
here instead? Or are there any limitations or performance concerns against that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using dpnp.where
instead of kernel is ~2x slower
import numpy, dpnp
a = numpy.linspace(-5, 5, 4096*4096)
%timeit numpy.piecewise(a, [a < 0, a >= 0], [-1, 1])
# 91.4 ms ± 148 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
ia = dpnp.array(a)
%timeit res = dpnp.piecewise(ia, [ia < 0, ia >= 0], [-1, 1]); res.sycl_queue.wait()
# 1.64 ms ± 4 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 3.06 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 100 loops each) # using dpnp.where
But it keeps the code simple and supports broadcasting
To improve performance, you can try to avoid dpnp.zeros_like()
and dpnp.full()
in condlen == funclen
and condlen + 1 == funclen
cases.
Instead, create one def_value
, allocate result once dpnp.empty_like(...)
, and update it in-place with dpnp.where(... out=result)
in the loop using any flags to detect the first iteration
First call - dpnp.where(condition, func, def_value, out=result)
to fill non-matching elements with def_value
Other calls - dpnp.where(condition, func, result, out=result) to keep existing values and overwrite only where condition is True.
Then I get on PVC for float32
# 5.97 ms ± 109 μs per loop (mean ± std. dev. of 7 runs, 100 loops each) # kernel
# vs
# 6.47 ms ± 8.89 μs per loop (mean ± std. dev. of 7 runs, 100 loops each) # optimized where
if isinstance(condlist, dpnp.ndarray) and condlist.ndim in [0, 1]: | ||
condlist = [condlist] | ||
elif dpnp.isscalar(condlist) or ( | ||
dpnp.isscalar(condlist[0]) and x.ndim != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no check that condlist
is exactly a list. Do we need that or it is assuming to accept any sequence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, condlist
can be a list, tuple or ndarray in both NumPy and CuPy. And dpnp
follows the same practice.
import numpy
a = numpy.linspace(-2.5, 2.5, 6)
numpy.piecewise(a, [a < 0, a >= 0], [-1, 1]) # condlist is list
# array([-1., -1., -1., 1., 1., 1.])
numpy.piecewise(a, (a < 0, a >= 0), [-1, 1]) # condlist is tuple
# array([-1., -1., -1., 1., 1., 1.])
numpy.piecewise(a, numpy.array([a < 0, a >= 0]), [-1, 1]) # condlist is array
# array([-1., -1., -1., 1., 1., 1.])
import cupy
x = cupy.array(a)
cupy.piecewise(x, [x < 0, x >= 0], [-1, 1]) # condlist is list
# array([-1., -1., -1., 1., 1., 1.])
cupy.piecewise(x, (x < 0, x >= 0), [-1, 1]) # condlist is tuple
# array([-1., -1., -1., 1., 1., 1.])
cupy.piecewise(x, cupy.array([x < 0, x >= 0]), [-1, 1]) # condlist is array
# array([-1., -1., -1., 1., 1., 1.])
import dpnp
ia = dpnp.array(a)
dpnp.piecewise(ia, [ia < 0, ia >= 0], [-1, 1]) # condlist is list
# array([-1., -1., -1., 1., 1., 1.])
dpnp.piecewise(ia, (ia < 0, ia >= 0), [-1, 1]) # condlist is tuple
# array([-1., -1., -1., 1., 1., 1.])
dpnp.piecewise(ia, dpnp.array([ia < 0, ia >= 0]), [-1, 1]) # condlist is array
# array([-1., -1., -1., 1., 1., 1.])
"Execution queue is not compatible with allocation queue."); | ||
} | ||
|
||
const bool is_result_c_contig = result.is_c_contiguous(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think condition
should also be C-contig
to keep the kernel running correctly
In [1]: import dpnp
In [2]: import numpy
In [3]: a = numpy.arange(8).reshape(2,2,2)
In [4]: cond = numpy.array(a>3, order="F")
In [5]: a_dp = dpnp.array(a)
In [6]: cond_dp = dpnp.array(cond, order="F")
In [7]: numpy.piecewise(a, [cond], [1])
Out[7]:
array([[[0, 0],
[0, 0]],
[[1, 1],
[1, 1]]])
In [8]: dpnp.piecewise(a_dp,[cond_dp],[1])
Out[8]:
array([[[0, 1],
[0, 1]],
[[0, 1],
[0, 1]]])
"Condition and result arrays must have the same dimension."); | ||
} | ||
|
||
if (!dpctl::utils::queues_are_compatible( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use check_queue
from common validation utils?
} | ||
|
||
const bool is_result_c_contig = result.is_c_contiguous(); | ||
if (!is_result_c_contig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
check_c_contig
if (!is_result_c_contig) { | ||
throw py::value_error("The result array is not c-contiguous."); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check for no memory overlap?
if isinstance(condlist, dpnp.ndarray) and condlist.ndim in [0, 1]: | ||
condlist = [condlist] | ||
elif dpnp.isscalar(condlist) or ( | ||
dpnp.isscalar(condlist[0]) and x.ndim != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to raise an error like ValueError
if condlist
is empty?
dpnp.piecewise(x, [], [1])
# output
IndexError: list index out of range
f"with {condlen} condition(s), either {condlen} or {condlen + 1} " | ||
"functions are expected" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please handle this case as well
x = dpnp.ones(3)
dpnp.piecewise(x, [x>0], dpnp.array([[1,2,3]]))
# output
RuntimeError: Unable to cast Python instance of type <class 'dpnp.dpnp_array.dpnp_array'> to C++ type '?' (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)
In this PR,
dpnp.piecewise
is implemented.